AITopics | semantic part

Recently, significant progress has been made in masked image modeling to catch up to masked language modeling. However, unlike words in NLP, images lack a semantic decomposition, which still makes masked autoencoding (MAE) different between vision and language. In this paper, we explore a potential visual analogue of words, i.e., semantic parts, and we integrate semantic information into the training process of MAE by proposing a Semantic-Guided Masking strategy. Compared to widely adopted random masking, our masking strategy can gradually guide the network to learn different kinds of information, from intra-part patterns to inter-part relations. In particular, we achieve this in two steps.

learning masked autoencoder, semantic-guided masking, semmae, (7 more...)

Neural Information Processing Systems

Country: Asia > China > Heilongjiang Province > Daqing (0.07)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.77)
Information Technology > Artificial Intelligence > Natural Language (0.62)

f6adf61977467560f79b95485d1f3a79-Supplemental-Conference.pdf

Neural Information Processing Systems | Aug-19-2025, 20:38:19 GMT

artificial intelligence, machine learning, progrip, (17 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.70)

db6caae0f83e45e454e2215f07e7c5af-Paper-Conference.pdf

Neural Information Processing Systems | Aug-19-2025, 09:53:25 GMT

artificial intelligence, completion, machine learning, (13 more...)

Country: Asia > Taiwan (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

Gang Li

Neural Information Processing Systems | Aug-15-2025, 03:06:32 GMT

In particular, we achieve this in two steps.

attention map, information, semantic part, (14 more...)

Country: Asia > China > Heilongjiang Province > Daqing (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders

Neural Information Processing Systems | Oct-11-2024, 05:27:25 GMT

learning masked autoencoder, semantic-guided masking, semmae, (4 more...)

Country: Asia > China > Heilongjiang Province > Daqing (0.08)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

SAGE: Bridging Semantic and Actionable Parts for GEneralizable Articulated-Object Manipulation under Language Instructions

Geng, Haoran, Wei, Songlin, Deng, Congyue, Shen, Bokui, Wang, He, Guibas, Leonidas

arXiv.org Artificial Intelligence | Dec-3-2023

Generalizable manipulation of articulated objects remains a challenging problem in many real-world scenarios, given the diverse object structures, functionalities, and goals. In these tasks, both semantic interpretation and physical plausibility are crucial for a policy to succeed. To address this problem, we propose SAGE, a novel framework that bridges the understanding of semantic and actionable parts of articulated objects to achieve generalizable manipulation under language instructions. Given a manipulation goal specified in natural language, an instruction interpreter built on Large Language Models (LLMs) first translates it into programmatic actions on the object's semantic parts. This process also involves a scene context parser for understanding the visual inputs, which generates scene descriptions containing both rich information and accurate interaction-related facts by combining generalist Visual-Language Models (VLMs) with domain-specialist part perception models. To further convert the action programs into executable policies, a part grounding module then maps the object semantic parts suggested by the instruction interpreter into so-called Generalizable Actionable Parts (GAParts). Finally, an interactive feedback module is incorporated to respond to failures, which greatly increases the robustness of the overall framework. Experiments both in simulation environments and on real robots show that our framework can handle a large variety of articulated objects with diverse language-instructed goals. We also provide a new benchmark for language-guided articulated-object manipulation in realistic scenarios.

arxiv preprint arxiv, manipulation, scene description, (14 more...)

2312.01307

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Does a Neural Network Really Encode Symbolic Concepts?

Li, Mingjie, Zhang, Quanshi

arXiv.org Artificial Intelligence | Dec-1-2023

Recently, a series of studies have tried to extract interactions between input variables modeled by a DNN and define such interactions as concepts encoded by the DNN. However, strictly speaking, there is still no solid guarantee that such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives. Extensive empirical studies have verified that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which is partially aligned with human intuition.

dataset, dnn, interaction effect, (15 more...)

2302.1308

Country: